Foreword


This paper was presented to the International
Working Conference on Model Realism, which was held
at Bad Honnef, West Germany, April 20-23, 1982.

The conference dealt with the modeling
aspects of different systems theories and
approaches. To get a common basis for discussion,
three problem studies were introduced. This paper
was a contribution to problem study 2:
Reorganization in a Socio-Technical System, which
postulated a distributed data base system with
communication facilities for data transfer.


SOME MATHEMATICAL METHODS FOR MODELING THE PERFORMANCE
OF A DISTRIBUTED DATA BASE SYSTEM


C. Bernard Barfoot
Center for Naval Analyses
2000 North Beauregard Street
Alexandria, Virginia 22311 U.S.A.

This paper presents some mathematical methods for evaluating the
performance of a distributed data base system (DDBS). Performance is
measured by the speed and accuracy with which data are transmitted from
one location to another. Five techniques are described: the Data Flow
Model, a semi-Markov model for determining the spatial and temporal
distribution of data that are to be transferred from one location to
another; Optimal Sample Size Estimation, a method for determining the
amount of data to be collected for input to the Data Flow Model;
Confidence Interval Estimation, a method for estimating confidence
intervals for the outputs of the Data Flow Model; Sensitivity Estimation,
a method for estimating the sensitivity of DDBS performance to changes
in the parameters of the Data Flow Model; and the Aggregation of
Stratified Semi-Markov Processes, a method for combining semi-Markov
Data Flow Models developed for subsystems (e.g., geographic regions)
into a single model that is representative of the entire DDBS.

INTRODUCTION

An important consideration in establishing a distributed
(decentralized) data base system (such as the one described in problem
study 2: Reorganization in a Socio-Technical System) is a method for
evaluating the performance of the system after it has been
implemented. Distributed data base systems (DDBS) often do not perform
as planned: data transfers take too long, mistakes are made in updates,
or data bases contain errors.

In this paper we describe some mathematical methods for modeling
the performance of a distributed data base system. These methods were
developed and used in an evaluation of the U.S. Marine Corps' Joint
Uniform Military Pay System/Manpower Management System (JUMPS/MMS) [1].

The particular aspects of performance we wish to evaluate are the
speed and accuracy with which data are transmitted from one location to
another. To do this we must measure the propagation of errors in the
DDBS and the delays and losses of data when transmitted from one
location to another. Specific questions that we want to answer in the
evaluation are these:

1. What percentage of data is transmitted from one location
(source) to another (destination) without loss or error?

2. How long do these transmissions take?

3. What happens to data that are not successfully
transmitted from source to destination?

4. How much will the percentage of successful transmissions
increase if a decrease is made in a particular loss/error
rate?

5. How much will the total transmission time decrease if a
decrease is made in the delay at a particular location?

6. What degree of confidence do we have in our results?

For the purposes of our evaluation we assume that the DDBS can be
modeled as a stochastic network in which the data flow from one location
to another. More specifically, we assume that, by an appropriate
definition of states, the flow of data can be modeled as a semi-Markov
process with two types of states: (1) transient states, which are
occupied temporarily by data before they move to another state, and (2)
absorbing states, which are terminal points for data, since once reached
they are never left (at least for the purposes of our evaluation). The
absorbing states of the model are the intended destination of the data
and the "data error or data loss" states associated with the transient
states of the process.

To illustrate these ideas before proceeding, consider the simple
state/flow diagram in figure 1. The dark lines in this illustrative
example indicate what we call the primary flow, which is the path that
data are supposed to take from source (state $S_1$) to destination
(state $S_7$) through intermediate state $S_2$. The light lines indicate
the secondary flow, which consists of the paths taken by data that are
misdirected, erroneous, or lost.


FIG. 1: STATE TRANSITION DIAGRAM (branches labeled "error or loss")


As indicated in the diagram, we assume that errors, losses, and
delays can occur at each transient state of the process. Data that are
erroneous or lost are modeled as transitioning to an absorbing state,
such as $S_4$, $S_5$, or $S_6$. Only error-free data move from one transient
state to another, or to the destination.

In the remainder of this paper we describe the following
analytical methods for evaluating DDBS performance:

• Data Flow Model, a semi-Markov model consisting of q
transient states and r absorbing states. Estimates of
the transition probabilities and times are assumed to be
based on sample data. The outputs of the model are
estimates of (1) the distribution of source data among the
absorbing states and (2) the time between source and
destination.

• Confidence Interval Estimation, a method for estimating
confidence intervals for the outputs of the Data Flow
Model, whose inputs are estimates, based on sample data,
of the transition probabilities and times.

• Optimal Sample Size Estimation, a method for estimating
the amount of data to be collected at each transient state
so as to obtain a given level of confidence in the output
of the Data Flow Model at minimum cost of data collection.

• Sensitivity Estimation, a method for estimating the
sensitivity of DDBS performance to changes in the
parameters of the Data Flow Model.

• Aggregation of Stratified Semi-Markov Processes, a method
for combining semi-Markov Data Flow Models developed for
subsystems (e.g., geographic regions) into a single model
that is representative of the entire system.

DATA FLOW MODEL

The flow of data in the DDBS is modeled as a finite, stationary,
absorbing, semi-Markov process. The inputs to the model are the
transition probabilities $p_{ij}$ from state $S_i$ to state $S_j$ and the delay
time $t_i$ at each transient state $S_i$. The outputs of the model are (1)
the probability that an item of data reaches its intended destination;
(2) the probability that an item is erroneously changed or lost at each
transient state; and (3) the mean time and the distribution of time for
an item of data to reach its intended destination.
Let $P = (p_{ij})$ be the matrix of true (but unknown) transition
probabilities. $P$ can be partitioned into submatrices

$$P = \begin{pmatrix} Q & R \\ 0 & I \end{pmatrix},$$

where the submatrix $Q$ contains the transition probabilities between
transient states, the submatrix $R$ contains the transition probabilities
from transient states to absorbing states, the zero submatrix $0$ contains
the transition probabilities from absorbing to transient states, and the
identity submatrix $I$ contains the transition probabilities between
absorbing states.

The expected number $h_{ij}$ of times the process is in transient state
$S_j$, given that it originated in transient state $S_i$, is [2]

$$H = (h_{ij}) = (I - Q)^{-1}.$$

The probability $b_{ij}$ that the process terminates in absorbing state
$S_j$, given that it was initially in state $S_i$, is [2]

$$B = (b_{ij}) = (I - Q)^{-1}R = HR.$$

The matrix $B$ provides the answers to questions 1 and 3 of our
evaluation objectives. From it we can determine the percentage of data
that reaches the destination without loss or error and the percentage
that is lost or erroneously changed at each transient state of the
process.
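A minimal sketch of the two formulas above, in plain Python so that it runs without extra libraries. The chain is hypothetical: three transient states (a source, an intermediate state, and a state for misdirected data) and four absorbing states (an error/loss state for each transient state plus the destination). The probabilities are illustrative only and are not taken from [1].

```python
# Sketch: H = (I - Q)^-1 and B = H R for a small absorbing chain.
# All probabilities are hypothetical.


def matmul(a, b):
    """Ordinary matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(x) for x in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-14:
            raise ValueError("matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        scale = m[col][col]
        m[col] = [x / scale for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def solve_chain(p, q):
    """Split the q transient rows of P into Q and R; return H and B = H R."""
    q_part = [row[:q] for row in p]
    r_part = [row[q:] for row in p]
    h = inverse([[float(i == j) - q_part[i][j] for j in range(q)]
                 for i in range(q)])
    return h, matmul(h, r_part)


# Transient rows of P. Columns: S1, S2, S3 (transient), then the absorbing
# states "error at S1", "error at S2", "error at S3", destination.
P_EXAMPLE = [
    [0.0, 0.9, 0.05, 0.05, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.05, 0.0, 0.05, 0.0, 0.9],
    [0.0, 0.8, 0.0, 0.0, 0.0, 0.2, 0.0],
]

H, B = solve_chain(P_EXAMPLE, 3)
print([round(x, 5) for x in B[0]])  # [0.05, 0.04896, 0.01979, 0.88125]
```

With these numbers about 88 percent of the items that leave $S_1$ reach the destination; the other entries of the first row of $B$ are the fractions lost or corrupted at each transient state, and each row of $B$ sums to one.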

If the process terminates at absorbing state $S_k$, the expected
number $m_{ij}$ of times that the process was in transient state $S_j$, given
that it originated in transient state $S_i$, is [2]

$$M = (m_{ij}) = D^{-1}HD,$$

where $D$ is a diagonal matrix whose diagonal elements are the probabilities
$b_{jk}$ that the process reaches absorbing state $S_k$ given that it started in
transient state $S_j$.

If the process terminates at absorbing state $S_k$, the expected time
$v_i$ between arrival at transient state $S_i$ and absorption in state $S_k$ is

$$V = (v_i) = MT, \qquad (1)$$

where $T = (t_j)$ is the column vector whose components are the expected
times the process spends in each transient state $S_j$ per visit.
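The conditional quantities can be sketched the same way. This assumes a hypothetical chain (three transient states; absorbing columns for an error at each transient state and for the destination), hypothetical mean delays per visit $T = (2, 5, 10)$, and that every transient state can reach the destination; otherwise $D$ has a zero on its diagonal and $D^{-1}$ does not exist.

```python
def matmul(a, b):
    """Ordinary matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(x) for x in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-14:
            raise ValueError("matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        scale = m[col][col]
        m[col] = [x / scale for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def solve_chain(p, q):
    """Split the q transient rows of P into Q and R; return H and B = H R."""
    q_part = [row[:q] for row in p]
    r_part = [row[q:] for row in p]
    h = inverse([[float(i == j) - q_part[i][j] for j in range(q)]
                 for i in range(q)])
    return h, matmul(h, r_part)


def conditional_times(h, b, t, k):
    """Return M = D^-1 H D (D = diag(b_jk)) and V = M T for absorption in column k of B."""
    n = len(h)
    d = [b[j][k] for j in range(n)]
    if any(x <= 0.0 for x in d):
        raise ValueError("every transient state must be able to reach S_k")
    m = [[h[i][j] * d[j] / d[i] for j in range(n)] for i in range(n)]
    v = [sum(mij * tj for mij, tj in zip(row, t)) for row in m]
    return m, v


# Hypothetical transient rows of P: columns S1, S2, S3, error at S1, S2, S3, destination.
P_EXAMPLE = [
    [0.0, 0.9, 0.05, 0.05, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.05, 0.0, 0.05, 0.0, 0.9],
    [0.0, 0.8, 0.0, 0.0, 0.0, 0.2, 0.0],
]
T = [2.0, 5.0, 10.0]   # hypothetical expected delay per visit at S1, S2, S3
DEST = 3               # column of B that holds the destination

H, B = solve_chain(P_EXAMPLE, 3)
M, V = conditional_times(H, B, T, DEST)
print(round(V[0], 4))  # 8.0505
```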

The vector $V$ provides part of the answer to question 2 of our
evaluation objectives. From it we can determine $v_1$, the expected time
from source (state $S_1$) to destination, by designating the destination
as the absorbing state $S_k$. The distribution of time from source to
destination is obtained by simulation rather than by analytic methods,
unless the individual delay times are exponentially distributed.
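Because the distribution of time is obtained by simulation, a minimal Monte Carlo sketch follows. The chain is hypothetical (the same three transient states and absorbing states as in the sketches above), and the delays are assumed, purely for illustration, to be gamma distributed with shape 2 and hypothetical means of 2, 5, and 10; any other delay distribution can be substituted in `one_transfer`. Among items that reach the destination, the sample mean should approach $v_1$ from $V = MT$ (about 8.05 for these numbers).

```python
import random
import statistics

# Hypothetical chain: for each transient state, the possible next states
# and their probabilities. Integers are transient states; strings are absorbing.
NEXT = {
    0: [(1, 0.9), (2, 0.05), ("error at S1", 0.05)],
    1: [(2, 0.05), ("error at S2", 0.05), ("destination", 0.9)],
    2: [(1, 0.8), ("error at S3", 0.2)],
}
MEAN_DELAY = [2.0, 5.0, 10.0]  # hypothetical mean delay per visit at S1, S2, S3


def one_transfer(rng):
    """Follow one item from S1 until absorption; return (final state, elapsed time)."""
    state, elapsed = 0, 0.0
    while isinstance(state, int):
        mean = MEAN_DELAY[state]
        elapsed += rng.gammavariate(2.0, mean / 2.0)   # gamma delay with this mean
        targets, weights = zip(*NEXT[state])
        state = rng.choices(targets, weights=weights)[0]
    return state, elapsed


def simulate(n_items, seed=1982):
    """Return the source-to-destination times of the items that reach the destination."""
    rng = random.Random(seed)
    times = []
    for _ in range(n_items):
        final, elapsed = one_transfer(rng)
        if final == "destination":
            times.append(elapsed)
    return times


N_ITEMS = 100_000
TIMES = simulate(N_ITEMS)
print(len(TIMES) / N_ITEMS)                     # close to b_1,dest = 0.88125
print(statistics.fmean(TIMES))                  # close to v_1, about 8.05
print(statistics.quantiles(TIMES, n=10)[-1])    # estimated 90th percentile
```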

CONFIDENCE INTERVALS FOR B

We assume that, in practice, the inputs to the Data Flow Model
(the transition probabilities and delay times) must be estimated from
sample data and, therefore, are known only with a certain degree of
accuracy. Consequently, the outputs of the Data Flow Model ($B$ and $V$)
are likewise known only with a certain degree of accuracy.

In this section we present a method for estimating confidence
intervals for the matrix $\hat{B}$ (an estimator for $B$) when the transition
probabilities are based on sample data. Similar methods, described in
[1], can be used for estimating confidence intervals for $\hat{V}$, an estimator
for $V$.

The Variance of $\hat{B}$

Let $N_i$ be the number of observations at transient state $S_i$ and let
the random variable $n_{ij}$ be the number of observations at $S_i$ having
outcome $S_j$. Then an unbiased, minimum variance, maximum likelihood
estimator for $p_{ij}$ is $\hat{p}_{ij} = n_{ij}/N_i$.


We then estimate the matrix $B$ by $\hat{B} = (I - \hat{Q})^{-1}\hat{R} = \hat{H}\hat{R}$,
where $\hat{Q}$ and $\hat{R}$ are submatrices of $\hat{P}$:

$$\hat{P} = \begin{pmatrix} \hat{Q} & \hat{R} \\ 0 & I \end{pmatrix}.$$
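A sketch of this estimation step, assuming hypothetical tallies $n_{ij}$ for a three-transient-state chain (columns: $S_1$, $S_2$, $S_3$, an error state for each, and the destination). The matrix helpers are included so that the block runs on its own.

```python
def matmul(a, b):
    """Ordinary matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(x) for x in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-14:
            raise ValueError("matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        scale = m[col][col]
        m[col] = [x / scale for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def solve_chain(p, q):
    """Split the q transient rows of P into Q and R; return H and B = H R."""
    q_part = [row[:q] for row in p]
    r_part = [row[q:] for row in p]
    h = inverse([[float(i == j) - q_part[i][j] for j in range(q)]
                 for i in range(q)])
    return h, matmul(h, r_part)


def estimate_p(counts):
    """Maximum likelihood estimates p_hat_ij = n_ij / N_i, one transient row at a time."""
    p_hat = []
    for row in counts:
        n_i = sum(row)
        if n_i == 0:
            raise ValueError("no observations at a transient state")
        p_hat.append([n_ij / n_i for n_ij in row])
    return p_hat


# Hypothetical tallies n_ij (columns S1, S2, S3, err S1, err S2, err S3, destination).
COUNTS = [
    [0, 356, 22, 22, 0, 0, 0],     # N_1 = 400 items observed at S1
    [0, 0, 20, 0, 30, 0, 450],     # N_2 = 500 items observed at S2
    [0, 41, 0, 0, 0, 9, 0],        # N_3 = 50 items observed at S3
]

P_HAT = estimate_p(COUNTS)
H_HAT, B_HAT = solve_chain(P_HAT, 3)
print(round(B_HAT[0][3], 4))  # 0.8701
```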

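Since $\hat{B}$ is a function of the $\hat{p}_{ij}$, we can expand $\hat{B}$ in a Taylor
series about $(p_{ij})$. The $\hat{p}_{ij}$ are not all independent variables,
however, since $\sum_j \hat{p}_{ij} = 1$ for all $i$. If, for each $i$, we
arbitrarily let $\hat{p}_{id} \in \hat{R}$ be the dependent variable such
that $\hat{p}_{id} = 1 - \sum_{j \ne d} \hat{p}_{ij}$, then the $\hat{p}_{ij}$ for $j \ne d$ are
independent variables. Hence $\hat{B}$ can be written

$$\hat{B} = B + \sum_i \sum_{j \ne d} \frac{\partial \hat{B}}{\partial \hat{p}_{ij}}\,(\hat{p}_{ij} - p_{ij}) + \text{higher order terms},$$

where $\partial \hat{B}/\partial \hat{p}_{ij}$ is understood to be evaluated at $\hat{p}_{ij} = p_{ij}$, $i$ is an
element of the index set of transient states, and $j$ is an element of
the index set of transient and absorbing states.*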

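If we ignore the higher order terms we have

$$\hat{B} - B \approx \sum_i \sum_j \frac{\partial \hat{B}}{\partial \hat{p}_{ij}}\,(\hat{p}_{ij} - p_{ij})$$

and

$$\mathrm{var}(\hat{B}) = E\left[(\hat{B} - B)^{(2)}\right] = E\left[\left(\sum_i \sum_j \frac{\partial \hat{B}}{\partial \hat{p}_{ij}}\,(\hat{p}_{ij} - p_{ij})\right)^{(2)}\right], \qquad (2)$$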


* As we shall see later, $\partial \hat{B}/\partial \hat{p}_{id} = 0$; hence we can actually sum over
all $j$ without difficulty.
where $(\hat{B} - B)^{(2)} = (\hat{B} - B) * (\hat{B} - B)$ denotes the congruent product* of
$(\hat{B} - B)$ with itself.

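Expanding (2) yields

$$\mathrm{var}(\hat{B}) = \sum_i \sum_j \sum_m \sum_k \frac{\partial \hat{B}}{\partial \hat{p}_{ij}} * \frac{\partial \hat{B}}{\partial \hat{p}_{mk}}\; E\left[(\hat{p}_{ij} - p_{ij})(\hat{p}_{mk} - p_{mk})\right]$$

$$= \sum_i \sum_j \sum_m \sum_k \frac{\partial \hat{B}}{\partial \hat{p}_{ij}} * \frac{\partial \hat{B}}{\partial \hat{p}_{mk}}\; \mathrm{cov}(\hat{p}_{ij}, \hat{p}_{mk})$$

$$= \sum_i \sum_j \sum_k \frac{\partial \hat{B}}{\partial \hat{p}_{ij}} * \frac{\partial \hat{B}}{\partial \hat{p}_{ik}}\; \frac{p_{ij}(\delta_{jk} - p_{ik})}{N_i},$$

where $\delta$ is the Kronecker delta and $N_i$ is the sample size at state $S_i$.
The last step uses the multinomial covariances of the $\hat{p}_{ij}$ and takes
the samples at different transient states to be independent, so that the
covariances with $m \ne i$ vanish.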
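The multinomial covariance used in the last step can be checked exactly for a small case. The sketch below enumerates every outcome for one transient state with a hypothetical probability row and $N_i = 3$, and compares the exact covariance of the $\hat{p}_{ij}$ with $p_{ij}(\delta_{jk} - p_{ik})/N_i$, using `fractions.Fraction` so that the comparison is exact rather than approximate.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def multinomial_pmf(counts, probs):
    """Probability of observing the given outcome counts in sum(counts) trials."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)          # each partial quotient is an integer
    weight = Fraction(coef)
    for c, p in zip(counts, probs):
        weight *= p ** c
    return weight


def exact_cov(probs, n):
    """Exact covariance matrix of p_hat = counts / n, by enumerating all outcomes."""
    k = len(probs)
    cov = [[Fraction(0)] * k for _ in range(k)]
    for counts in product(range(n + 1), repeat=k):
        if sum(counts) != n:
            continue
        w = multinomial_pmf(counts, probs)
        dev = [Fraction(c, n) - p for c, p in zip(counts, probs)]
        for j in range(k):
            for l in range(k):
                cov[j][l] += w * dev[j] * dev[l]
    return cov


def formula_cov(probs, n):
    """The expression in the text: p_j (delta_jl - p_l) / n."""
    k = len(probs)
    return [[probs[j] * ((1 if j == l else 0) - probs[l]) / n for l in range(k)]
            for j in range(k)]


P_ROW = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]   # hypothetical row of P
print(exact_cov(P_ROW, 3) == formula_cov(P_ROW, 3))         # True
```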

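For convenience, let $a_{ij} = \partial \hat{B}/\partial \hat{p}_{ij}$ and let $A_i$ be the vector of
matrices $A_i = (a_{ij}) = (a_{i1}, \ldots, a_{in})$. Also, let $P_i$ denote the $i$th row
of $P$.

Then $\mathrm{var}(\hat{B})$ can be written in matrix notation as

$$\mathrm{var}(\hat{B}) = \sum_i \frac{1}{N_i}\left(A_i^{(2)} P_i^T - (A_i P_i^T)^{(2)}\right), \qquad (3)$$

where $P_i^T$ is the transpose of $P_i$, $A_i P_i^T = \sum_j a_{ij} p_{ij}$, and $A_i^{(2)} = (a_{ij}^{(2)})$.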
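The passage from the triple sum to equation (3) is pure algebra, so it can be checked numerically. In the sketch below the derivative matrices $a_{ij}$ and the probability rows are random stand-ins (hypothetical, not derived from any model); `eq3_term` is the bracketed matrix expression in (3), `double_sum_term` is the expanded sum it replaces, and `var_b_hat` assembles (3) for given sample sizes.

```python
import random


def eq3_term(a_list, p_row):
    """A_i^(2) P_i^T - (A_i P_i^T)^(2) for one transient state S_i.

    a_list[j] is the matrix a_ij (lists of lists, all the same shape) and
    p_row[j] is p_ij."""
    rows, cols = len(a_list[0]), len(a_list[0][0])
    first = [[sum(a[r][c] ** 2 * p for a, p in zip(a_list, p_row))
              for c in range(cols)] for r in range(rows)]       # A_i^(2) P_i^T
    mean = [[sum(a[r][c] * p for a, p in zip(a_list, p_row))
             for c in range(cols)] for r in range(rows)]        # A_i P_i^T
    return [[f - m * m for f, m in zip(rf, rm)] for rf, rm in zip(first, mean)]


def double_sum_term(a_list, p_row):
    """The same quantity as sum_j sum_k a_ij * a_ik * p_ij (delta_jk - p_ik)."""
    rows, cols = len(a_list[0]), len(a_list[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for j, (aj, pj) in enumerate(zip(a_list, p_row)):
        for k, (ak, pk) in enumerate(zip(a_list, p_row)):
            cov = pj * ((1.0 if j == k else 0.0) - pk)   # N_i * cov(p_hat_ij, p_hat_ik)
            for r in range(rows):
                for c in range(cols):
                    out[r][c] += aj[r][c] * ak[r][c] * cov
    return out


def var_b_hat(a_by_state, p_by_state, n_by_state):
    """Equation (3): the sum over transient states of eq3_term / N_i."""
    total = None
    for a_list, p_row, n_i in zip(a_by_state, p_by_state, n_by_state):
        term = [[x / n_i for x in row] for row in eq3_term(a_list, p_row)]
        total = term if total is None else [
            [x + y for x, y in zip(rt, rs)] for rt, rs in zip(total, term)]
    return total


def random_case(rng, n_outcomes=5, shape=(3, 4)):
    """Hypothetical derivative matrices and a probability row, for checking only."""
    weights = [rng.random() for _ in range(n_outcomes)]
    s = sum(weights)
    p_row = [w / s for w in weights]
    a_list = [[[rng.uniform(-1, 1) for _ in range(shape[1])] for _ in range(shape[0])]
              for _ in range(n_outcomes)]
    return a_list, p_row


rng = random.Random(7)
A1, P1 = random_case(rng)
A2, P2 = random_case(rng)
GAP = max(abs(x - y) for rl, rr in zip(eq3_term(A1, P1), double_sum_term(A1, P1))
          for x, y in zip(rl, rr))
print(GAP < 1e-12)  # True
VAR = var_b_hat([A1, A2], [P1, P2], [200, 350])
```

Each entry of `eq3_term` is the variance of a discrete random variable, so every entry of the assembled $\mathrm{var}(\hat{B})$ is non-negative.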


* The congruent product is defined in general for a matrix $A = (a_{ij})$ as
$A * A = A^{(2)} = (a_{ij}^2)$. Thus congruent matrix multiplication is the
multiplication of corresponding elements. More generally, the congruent
product of two commensurate matrices $A = (a_{ij})$ and $B = (b_{ij})$ is defined
by $A * B = (a_{ij} b_{ij})$ for all $i$ and $j$.
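The congruent product is the operation now usually called the Hadamard (or Schur) product. A short sketch:

```python
def congruent(a, b):
    """Congruent (element-by-element) product A * B of two commensurate matrices."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(congruent(A, B))  # [[5, 12], [21, 32]]
print(congruent(A, A))  # [[1, 4], [9, 16]], i.e. A^(2)
```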
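The Derivative of $\hat{B}$

The expression for $\mathrm{var}(\hat{B})$ contains the derivative of $\hat{B}$. Differentiating
$\hat{B} = (I - \hat{Q})^{-1}\hat{R}$, and using
$\partial (I - \hat{Q})^{-1} = (I - \hat{Q})^{-1}\,\partial\hat{Q}\,(I - \hat{Q})^{-1}$, gives

$$\frac{\partial \hat{B}}{\partial \hat{p}_{ij}} = \hat{H}\left(\frac{\partial \hat{Q}}{\partial \hat{p}_{ij}}\,\hat{B} + \frac{\partial \hat{R}}{\partial \hat{p}_{ij}}\right).$$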

By definition, $\hat{Q} = (\hat{p}_{rc})$ when $r$ and $c$ are both elements of the
index set of transient states and $\hat{R} = (\hat{p}_{rc})$ when $r$ is an element of
the index set of transient states but $c$ is an element of the index set
of absorbing states.


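Therefore,

$$\frac{\partial \hat{Q}}{\partial \hat{p}_{ij}} = \begin{cases} 0, & \hat{p}_{ij} \in \hat{R} \\ (\delta_{ri}\delta_{cj}), & \hat{p}_{ij} \in \hat{Q} \end{cases}$$

and, because the dependent variable $\hat{p}_{id}$ changes by the negative of any change in $\hat{p}_{ij}$,

$$\frac{\partial \hat{R}}{\partial \hat{p}_{ij}} = \begin{cases} (\delta_{ri}(\delta_{cj} - \delta_{cd})), & \hat{p}_{ij} \in \hat{R} \\ (-\delta_{ri}\delta_{cd}), & \hat{p}_{ij} \in \hat{Q}. \end{cases}$$

Hence

$$\frac{\partial \hat{B}}{\partial \hat{p}_{ij}} = \begin{cases} (\hat{h}_{ri}(\delta_{cj} - \delta_{cd})), & \hat{p}_{ij} \in \hat{R} \\ (\hat{h}_{ri}(\hat{b}_{jc} - \delta_{cd})), & \hat{p}_{ij} \in \hat{Q}. \end{cases} \qquad (4)$$

As noted previously, when $\hat{p}_{ij} \in \hat{R}$ and $j = d$, $\partial \hat{B}/\partial \hat{p}_{ij} = 0$.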
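Equation (4) can be checked against central finite differences. The sketch below uses a hypothetical three-transient-state chain and an arbitrary (hypothetical) choice of dependent column $d$ for each transient row; each finite difference moves $p_{ij}$ and $p_{id}$ in opposite directions so that the row still sums to one, which is the dependence assumed in the derivation.

```python
def matmul(a, b):
    """Ordinary matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(x) for x in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-14:
            raise ValueError("matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        scale = m[col][col]
        m[col] = [x / scale for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def solve_chain(p, q):
    """Split the q transient rows of P into Q and R; return H and B = H R."""
    q_part = [row[:q] for row in p]
    r_part = [row[q:] for row in p]
    h = inverse([[float(i == j) - q_part[i][j] for j in range(q)]
                 for i in range(q)])
    return h, matmul(h, r_part)


def d_b(h, b, q, i, j, d):
    """Equation (4): dB/dp_ij when p_id (column d of P, an absorbing state) is dependent."""
    r = len(b[0])
    if j < q:                                      # p_ij is in Q: use row j of B
        b_j = b[j]
    else:                                          # p_ij is in R: b_jc = delta_jc
        b_j = [float(c == j - q) for c in range(r)]
    return [[h[rho][i] * (b_j[c] - float(c == d - q)) for c in range(r)]
            for rho in range(q)]


def finite_difference(p, q, i, j, d, eps=1e-6):
    """Central difference in p_ij, moving p_id the opposite way so row i sums to 1."""
    def b_at(delta):
        pp = [row[:] for row in p]
        pp[i][j] += delta
        pp[i][d] -= delta
        return solve_chain(pp, q)[1]
    up, down = b_at(eps), b_at(-eps)
    return [[(x - y) / (2 * eps) for x, y in zip(ru, rd)] for ru, rd in zip(up, down)]


# Hypothetical transient rows of P: columns S1, S2, S3, error at S1, S2, S3, destination.
P_EXAMPLE = [
    [0.0, 0.9, 0.05, 0.05, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.05, 0.0, 0.05, 0.0, 0.9],
    [0.0, 0.8, 0.0, 0.0, 0.0, 0.2, 0.0],
]
Q_N = 3
DEP = {0: 3, 1: 6, 2: 5}   # hypothetical dependent column d for each transient row

H, B = solve_chain(P_EXAMPLE, Q_N)
WORST = 0.0
for i in range(Q_N):
    for j in range(len(P_EXAMPLE[0])):
        if j == DEP[i]:
            continue
        exact = d_b(H, B, Q_N, i, j, DEP[i])
        approx = finite_difference(P_EXAMPLE, Q_N, i, j, DEP[i])
        WORST = max(WORST, max(abs(x - y) for re_, ra in zip(exact, approx)
                               for x, y in zip(re_, ra)))
print(WORST)  # largest disagreement between (4) and the finite differences
```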
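
Confidence Intervals

To estimate confidence intervals for $B$ we assume that $\hat{B}$ is
normally distributed with mean $B$ and variance $\mathrm{var}(\hat{B})$. Then an
approximate $1 - \alpha$ confidence interval for $B$ is

$$\left(\hat{B}^* - z_{\alpha/2}\left(\mathrm{var}^*(\hat{B})\right)^{(1/2)},\ \hat{B}^* + z_{\alpha/2}\left(\mathrm{var}^*(\hat{B})\right)^{(1/2)}\right),$$

where $\hat{B}^*$ and $\mathrm{var}^*(\hat{B})$ are the sample values of $\hat{B}$ and $\mathrm{var}(\hat{B})$,
$(\mathrm{var}(\hat{B}))^{(1/2)}$ is the matrix such that $\left[(\mathrm{var}(\hat{B}))^{(1/2)}\right]^{(2)} = \mathrm{var}(\hat{B})$,
and

$$\int_{z_{\alpha/2}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx = \alpha/2.$$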
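A sketch of the interval computation. `B_STAR` and `VAR_STAR` are hypothetical sample values (one row of $\hat{B}^*$ and of $\mathrm{var}^*(\hat{B})$); `statistics.NormalDist().inv_cdf` supplies $z_{\alpha/2}$, and the square root is taken element by element, matching the congruent square root in the text.

```python
from statistics import NormalDist


def confidence_interval(b_star, var_star, alpha=0.05):
    """Elementwise approximate 1 - alpha limits B* -/+ z_{alpha/2} (var*)^(1/2)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)              # upper alpha/2 point of N(0, 1)
    sd = [[v ** 0.5 for v in row] for row in var_star]   # congruent square root
    lower = [[b - z * s for b, s in zip(rb, rs)] for rb, rs in zip(b_star, sd)]
    upper = [[b + z * s for b, s in zip(rb, rs)] for rb, rs in zip(b_star, sd)]
    return lower, upper


# Hypothetical first rows of B* and var*(B): error at S1, S2, S3, destination.
B_STAR = [[0.050, 0.049, 0.020, 0.881]]
VAR_STAR = [[1.2e-4, 1.1e-4, 4.0e-5, 2.9e-4]]

LOWER, UPPER = confidence_interval(B_STAR, VAR_STAR)
for name, lo, hi in zip(["err S1", "err S2", "err S3", "dest"], LOWER[0], UPPER[0]):
    print(f"{name}: ({lo:.3f}, {hi:.3f})")
```

Because the interval rests on a normal approximation, its limits can fall outside [0, 1] for probabilities near 0 or 1; clipping them to that range is one simple practical adjustment.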


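OPTIMAL SAMPLE SIZES

Now that we have an estimate of a confidence interval for $B$, we
can estimate the sample sizes $N_i$ at each transient state $S_i$ that will
produce a confidence interval of a desired width for a particular $b_{IJ}$
at minimum cost.

Let $W_{IJ}$ be the desired width of the confidence interval for $b_{IJ}$
and let

$$G_{iIJ} = e_I^T\left(A_i^{(2)} P_i^T - (A_i P_i^T)^{(2)}\right) e_J,$$

where $e_I$ and $e_J$ are the unit column vectors $e_I = (\delta_{rI})$ and $e_J = (\delta_{rJ})$.
Since, by equation (3), $\mathrm{var}(\hat{b}_{IJ}) = \sum_i G_{iIJ}/N_i$, we can write $W_{IJ}$ as

$$W_{IJ} = 2\,z_{\alpha/2}\left(\sum_i \frac{G_{iIJ}}{N_i}\right)^{1/2}.$$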
Let $C_i(N_i)$ be the cost of collecting a sample of size $N_i$ at
transient state $S_i$. We assume $C_i$ has the form $C_i(N_i) = \beta_i + \gamma_i N_i$.
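Under this linear cost model, the minimization stated next has the closed-form Lagrange-multiplier solution given at the end of this section. The sketch below uses hypothetical values of $G_{iIJ}$, $\gamma_i$, $\beta_i$, and $W_{IJ}$ (and assumes every $G_{iIJ} > 0$); it computes the optimal $N_i^*$, confirms that they give exactly the width $W_{IJ}$, and compares the cost with an equal allocation that achieves the same width.

```python
from math import ceil, sqrt
from statistics import NormalDist


def optimal_sample_sizes(g, gamma, width, alpha=0.05):
    """N_i* = (2 z / W)^2 (G_i / gamma_i)^(1/2) sum_j (G_j gamma_j)^(1/2)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    scale = (2 * z / width) ** 2
    total = sum(sqrt(gj * cj) for gj, cj in zip(g, gamma))
    return [scale * sqrt(gi / ci) * total for gi, ci in zip(g, gamma)]


def interval_width(g, n, alpha=0.05):
    """W = 2 z (sum_i G_i / N_i)^(1/2) for the sample sizes n."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * z * sqrt(sum(gi / ni for gi, ni in zip(g, n)))


def total_cost(beta, gamma, n):
    """Sum of the linear costs beta_i + gamma_i N_i."""
    return sum(b + c * ni for b, c, ni in zip(beta, gamma, n))


G = [0.020, 0.090, 0.004]   # hypothetical G_iIJ at three transient states
GAMMA = [1.0, 4.0, 2.5]     # hypothetical cost per observation, gamma_i
BETA = [50.0, 20.0, 10.0]   # hypothetical fixed costs, beta_i
W = 0.04                    # desired width of the interval for b_IJ

N_STAR = optimal_sample_sizes(G, GAMMA, W)
# Equal allocation that achieves the same width, for comparison.
Z95 = NormalDist().inv_cdf(0.975)
N_EQUAL = [(2 * Z95 / W) ** 2 * sum(G)] * len(G)

print([ceil(n) for n in N_STAR])
print(round(interval_width(G, N_STAR), 6))   # 0.04
print(round(total_cost(BETA, GAMMA, N_STAR)), round(total_cost(BETA, GAMMA, N_EQUAL)))
```

In practice each $N_i^*$ would be rounded up to an integer, which can only narrow the interval.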

We wish to determine the values of the $N_i$ such that the total cost
of data collection will be minimized and the desired confidence interval
width will be obtained. Thus, our problem is to

$$\text{minimize} \quad \sum_i C_i(N_i) = \sum_i (\beta_i + \gamma_i N_i) \qquad (5)$$

subject to

$$2\,z_{\alpha/2}\left(\sum_i \frac{G_{iIJ}}{N_i}\right)^{1/2} = W_{IJ}.$$
The solution to this constrained optimization problem (using
Lagrange multipliers) yields $N_i^*$, the optimal value of $N_i$:

$$N_i^* = \left(\frac{2\,z_{\alpha/2}}{W_{IJ}}\right)^2 \left(\frac{G_{iIJ}}{\gamma_i}\right)^{1/2} \sum_j \left(G_{jIJ}\,\gamma_j\right)^{1/2}.$$


SENSITIVITY ANALYSIS

Two of our evaluation objectives are to determine the sensitivity
of the percentage of successful transmissions to changes in individual
error/loss rates (question 4) and the sensitivity of the total
transmission time to changes in the individual sojourn times
(question 5).


If $y = f(x_1, \ldots, x_q)$, then the sensitivity of $y$ to $x_i$ is

$$\frac{\partial y}{\partial x_i}\,\frac{x_i}{y}.$$

The sensitivity of $y$ to $x_i$ is, approximately, the percentage change in $y$
that results from a 1 percent change in $x_i$.


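Sensitivity to Errors and Losses

In our evaluation we want to determine the sensitivity of $b_{1k}$
(the probability of successful transmission from source (state $S_1$) to
destination (state $S_k$)) to some error/loss transition probability
$p_{ij} \in R$.

Using the expression for the derivative of $B$ (equation (4)), with
$\delta_{kj} = 0$ because $S_j$ is an error/loss state, and with the transition
$p_{ik}$ into the destination taken as the dependent variable ($d = k$), we
have $\partial b_{1k}/\partial p_{ij} = -h_{1i}$, so that

$$\frac{\partial b_{1k}}{\partial p_{ij}}\,\frac{p_{ij}}{b_{1k}} = -\frac{h_{1i}\,p_{ij}}{b_{1k}}.$$

Thus the sensitivity is directly proportional not only to $p_{ij}$, but
also to $h_{1i}$, the expected number of times the process is in state $S_i$,
given that it began in state $S_1$. The negative sign indicates that an
increase in either $h_{1i}$ or $p_{ij}$ results in a decrease in $b_{1k}$.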
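A numerical check of this expression on a hypothetical three-transient-state chain. The error/loss transition is taken to be $S_2 \to$ "error at $S_2$", and, as the formula requires, the compensating change is made in the transition from $S_2$ directly to the destination ($d = k$). The finite-difference elasticity should agree with $-h_{1i}p_{ij}/b_{1k}$.

```python
def matmul(a, b):
    """Ordinary matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(x) for x in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-14:
            raise ValueError("matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        scale = m[col][col]
        m[col] = [x / scale for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def solve_chain(p, q):
    """Split the q transient rows of P into Q and R; return H and B = H R."""
    q_part = [row[:q] for row in p]
    r_part = [row[q:] for row in p]
    h = inverse([[float(i == j) - q_part[i][j] for j in range(q)]
                 for i in range(q)])
    return h, matmul(h, r_part)


# Hypothetical transient rows of P: columns S1, S2, S3, error at S1, S2, S3, destination.
P_EXAMPLE = [
    [0.0, 0.9, 0.05, 0.05, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.05, 0.0, 0.05, 0.0, 0.9],
    [0.0, 0.8, 0.0, 0.0, 0.0, 0.2, 0.0],
]
Q_N, DEST = 3, 6        # three transient states; column 6 of P is the destination
K = DEST - Q_N          # the destination's column in B
I_ROW, J_COL = 1, 4     # hypothetical error/loss transition: S2 -> "error at S2"

H, B = solve_chain(P_EXAMPLE, Q_N)
P_IJ = P_EXAMPLE[I_ROW][J_COL]
FORMULA = -H[0][I_ROW] * P_IJ / B[0][K]          # -h_1i p_ij / b_1k


def b_1k(delta):
    """b_1k after raising p_ij by delta and lowering p_ik (so d = k) by delta."""
    pp = [row[:] for row in P_EXAMPLE]
    pp[I_ROW][J_COL] += delta
    pp[I_ROW][DEST] -= delta
    return solve_chain(pp, Q_N)[1][0][K]


EPS = 1e-6
NUMERIC = (b_1k(EPS) - b_1k(-EPS)) / (2 * EPS) * P_IJ / B[0][K]
print(round(FORMULA, 6), round(NUMERIC, 6))  # -0.055556 -0.055556
```

For these numbers, a 1 percent increase in the error rate at $S_2$ lowers the probability of successful transmission by about 0.056 percent.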
Sensitivity to Delays

We also want to determine the sensitivity of $v_1$ (the expected time
from source (state $S_1$) to destination (state $S_k$)) to the expected delay
time $t_j$ at transient state $S_j$.

From equation (1), $V = MT$, so $\partial v_1/\partial t_j = m_{1j}$.
Therefore

$$\frac{\partial v_1}{\partial t_j}\,\frac{t_j}{v_1} = \frac{m_{1j}\,t_j}{v_1}.$$
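A small illustration with a hypothetical chain and hypothetical mean delays per visit $T = (2, 5, 10)$. Because $V = MT$ is linear in $T$, the sensitivities $m_{1j} t_j / v_1$ sum to one, so they can be read as the shares of the expected source-to-destination time attributable to each transient state.

```python
def matmul(a, b):
    """Ordinary matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(x) for x in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-14:
            raise ValueError("matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        scale = m[col][col]
        m[col] = [x / scale for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def solve_chain(p, q):
    """Split the q transient rows of P into Q and R; return H and B = H R."""
    q_part = [row[:q] for row in p]
    r_part = [row[q:] for row in p]
    h = inverse([[float(i == j) - q_part[i][j] for j in range(q)]
                 for i in range(q)])
    return h, matmul(h, r_part)


# Hypothetical transient rows of P: columns S1, S2, S3, error at S1, S2, S3, destination.
P_EXAMPLE = [
    [0.0, 0.9, 0.05, 0.05, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.05, 0.0, 0.05, 0.0, 0.9],
    [0.0, 0.8, 0.0, 0.0, 0.0, 0.2, 0.0],
]
T = [2.0, 5.0, 10.0]   # hypothetical expected delay per visit at S1, S2, S3
K = 3                  # destination column of B

H, B = solve_chain(P_EXAMPLE, 3)
M = [[H[a][c] * B[c][K] / B[a][K] for c in range(3)] for a in range(3)]   # D^-1 H D
V = [sum(m * t for m, t in zip(row, T)) for row in M]                      # V = M T
SENS = [M[0][j] * T[j] / V[0] for j in range(3)]                           # m_1j t_j / v_1

print([round(s, 4) for s in SENS])
print(round(sum(SENS), 12))  # 1.0
```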

AGGREGATION OF STRATIFIED SEMI-MARKOV PROCESSES

When modeling the performance of a DDBS, it might be necessary or
desirable to stratify the system into subsystems and model the
performance of each of these individual subsystems. For example, in our
DDBS evaluation, the flow of data might be modeled for separate
geographic regions (subsystems) to determine their performance. The
Data Flow Model for each region would have the same state space, but the
input data to the model might be different for each region because of
differences in performance. When the above procedure is used, the
resulting semi-Markov process of each subsystem becomes a conditional
semi-Markov process in that its transition probabilities and times are
relative to its associated subsystem.

In this section we derive the method for aggregating these
separate conditional processes (each of which has the same state space)
into a single (unconditional) process that is representative of the
total system and has the same state space as the conditional processes.


Illustrative Example

Figure 2 illustrates a simple four-state process $\Psi$ which has been
stratified into two subprocesses $\Psi_1$ and $\Psi_2$.

Suppose data have been collected for each subprocess and transition
probabilities have been estimated as shown in the matrices of transition
probabilities $P_1$ and $P_2$. Further, suppose we know the fraction of time
(i.e., the probability) that the process originates in each subprocess,
say $f_1$ and $f_2$, and also know that the process always begins in state
$S_1$. Given this information, how do we determine $P$? This example
illustrates the general problem addressed in this section. As we shall
see later, what might be considered as two "obvious" methods of
determining $P$ do not, in general, work: (1) aggregating the data from
the subprocesses and (2) defining $P = f_1P_1 + f_2P_2$.

FIG. 2: FOUR-STATE PROCESS WITH TWO SUBPROCESSES
(a. Subprocess #1; b. Subprocess #2; c. Process $\Psi$)

Derivation of $P$

Suppose the process $\Psi$ under study can be stratified into $m$
subprocesses $\Psi_k$, $k \in \{1, 2, \ldots, m\}$, each with the same state space. We
assume that $\Psi$ and all the $\Psi_k$ are being modeled as irreducible
absorbing semi-Markov processes with $q$ transient states and $r$
absorbing states. We also assume that the matrix $P_k$ of transition
probabilities for subprocess $\Psi_k$ can be partitioned into submatrices $Q_k$,
$R_k$, $0$, and $I$ in the same manner that $P$ has been partitioned earlier in
the paper.

As shown earlier, the probability that $\Psi_k$ is absorbed in $S_j$, given
that the process began in transient state $S_i$, is

$$B_k = (b_{ijk}) = (I - Q_k)^{-1}R_k = H_kR_k.$$

We assume, without loss of generality, that $\Psi_k$ always begins in a
particular transient state, say $S_1$. Then the probability that $\Psi_k$
terminates in $S_j$ is the $j$th element of $b_{1k}^T = e_1^T B_k = e_1^T H_k R_k$, where $e_1$ is
the unit column vector $e_1 = (\delta_{r1})$.

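If we let $f_k$ be the probability that the process originates
in $\Psi_k$ ($\sum_k f_k = 1$), then the probability that the process terminates in $S_j$
is the $j$th element of

$$b_1^T = \sum_k f_k\, b_{1k}^T = \sum_k f_k\, e_1^T H_k R_k.$$

Letting $D_k$ be the diagonal matrix whose diagonal elements are the elements
of $e_1^T H_k$ (the expected numbers of times $\Psi_k$ is in each transient state),
we then have

$$Q = \left(\sum_k f_k D_k\right)^{-1} \sum_k f_k D_k Q_k$$

and

$$R = \left(\sum_k f_k D_k\right)^{-1} \sum_k f_k D_k R_k,$$

so that the aggregated process reproduces both the expected visit counts
$\sum_k f_k\, e_1^T H_k$ and the absorption probabilities $b_1^T$.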
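A sketch of the aggregation for a hypothetical four-state process (transient $S_1$, $S_2$; absorbing destination and error). It assumes that each aggregated transition probability is the visit-weighted average $\sum_k f_k h_{1ik} p_{ijk} / \sum_k f_k h_{1ik}$, which is what $Q = (\sum_k f_k D_k)^{-1}\sum_k f_k D_k Q_k$ and the analogous expression for $R$ amount to when $D_k = \mathrm{diag}(e_1^T H_k)$. It checks that the aggregated chain reproduces $b_1^T = \sum_k f_k b_{1k}^T$ and shows that the naive mixture $f_1P_1 + f_2P_2$ does not.

```python
def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(x) for x in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-14:
            raise ValueError("matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        scale = m[col][col]
        m[col] = [x / scale for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def from_s1(q, r):
    """Return (e_1^T H, e_1^T H R): expected visits and absorption probabilities from S1."""
    n = len(q)
    h = inverse([[float(i == j) - q[i][j] for j in range(n)] for i in range(n)])
    visits = h[0]
    absorb = [sum(visits[i] * r[i][c] for i in range(n)) for c in range(len(r[0]))]
    return visits, absorb


def aggregate(parts, f):
    """Visit-weighted aggregation of subprocesses (Q_k, R_k) with origin probabilities f_k."""
    n = len(parts[0][0])
    visits = [from_s1(q, r)[0] for q, r in parts]
    weight = [sum(fk * v[i] for fk, v in zip(f, visits)) for i in range(n)]
    if any(w == 0.0 for w in weight):
        raise ValueError("a transient state is never visited from S1")

    def mix(which):
        """which = 0 mixes the Q_k, which = 1 mixes the R_k."""
        width = len(parts[0][which][0])
        return [[sum(fk * v[i] * part[which][i][c]
                     for fk, v, part in zip(f, visits, parts)) / weight[i]
                 for c in range(width)]
                for i in range(n)]

    return mix(0), mix(1)


# Hypothetical four-state process: transient S1, S2; absorbing (destination, error).
PARTS = [
    ([[0.0, 1.0], [0.0, 0.0]], [[0.0, 0.0], [0.9, 0.1]]),   # subprocess 1: Q_1, R_1
    ([[0.0, 0.5], [0.0, 0.0]], [[0.0, 0.5], [0.5, 0.5]]),   # subprocess 2: Q_2, R_2
]
F = [0.5, 0.5]

ABSORBS = [from_s1(q, r)[1] for q, r in PARTS]
TARGET = [sum(fk * a[c] for fk, a in zip(F, ABSORBS)) for c in range(2)]
Q_AGG, R_AGG = aggregate(PARTS, F)
Q_NAIVE = [[sum(fk * part[0][i][j] for fk, part in zip(F, PARTS)) for j in range(2)]
           for i in range(2)]
R_NAIVE = [[sum(fk * part[1][i][c] for fk, part in zip(F, PARTS)) for c in range(2)]
           for i in range(2)]

print([round(x, 3) for x in TARGET])                           # [0.575, 0.425]
print([round(x, 3) for x in from_s1(Q_AGG, R_AGG)[1]])         # [0.575, 0.425]
print([round(x, 3) for x in from_s1(Q_NAIVE, R_NAIVE)[1]])     # [0.525, 0.475]
```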

PRACTICAL EXPERIENCE WITH THE MODELS

The Data Flow Model described in this paper has been used in three
separate evaluations of the performance of distributed data base systems
([1], [3], and [4]). The methods for estimating confidence intervals,
for estimating sensitivity, and for aggregating stratified semi-Markov
processes were also used in [1]. The system under study in [1] was
stratified into five subsystems by geographic location. The Data Flow
Model for each subsystem comprised 57 transient states and 32 absorbing
states. Data for estimating transition probabilities and times were
collected by a combination of automated and manual methods. Sample
sizes were determined heuristically rather than by the method described
in this paper, which was not developed until after the completion
of [1].

CONCLUSION


This paper has presented a framework and a set of mathematical
methods for modeling the performance of a distributed data base system,
where performance is measured by the speed and accuracy with which data
are transmitted from one location to another. A semi-Markov Data Flow
Model was described that provides estimates of the temporal and spatial
distribution of data that are to be transmitted from a source to a
destination. Methods were presented for estimating confidence intervals
for model outputs and for estimating optimal sample sizes for model
inputs. Techniques were derived for determining the sensitivity of
model outputs to model parameters. A method was presented for
aggregating a set of stratified semi-Markov Data Flow Models into a
single model.